This project focuses on the asset map item for the CFA NDoCH event; it consists of a web application to visualize various assets within the Sacramento area. The data is sourced from various open data portals.
Data processing and map creation are done in Jupyter Notebook because it visualizes results effectively and efficiently. Installation instructions and more information about Jupyter and the Python programming language are listed below.
The CFA-NDoCH instructions are shown below and specify that open data sources should be used to visualize resources available to the Sacramento community. This notebook is intended as a starting point for visualizing such data and for further development.
Asset mapping is an integral part of empowered community building that is based on understanding the strengths and needs of diverse communities. First, use publicly available information about your locale to give a sense of the landscape and demographics. Next, research the location and availability of government programs (e.g. county health and human services offices), community based organizations (like resource centers, food banks, and legal aid clinics) or other resources that are vital to your community. Visually documenting the landscape can help identify what might make your community more equitable and accessible to all who live there.
This notebook starts with a tutorial that uses Python mapping tools as a prototype, then develops a map for the Sacramento area. Open data sources are listed below, and more will be added as development continues.
Tutorials
CA Geoportal Data
SACOG Data
City of Sacramento Data
This notebook will require some basic understanding of the Python programming language, Jupyter platform and data analysis concepts.
Jupyter is a powerful, open-source, lightweight collaborative tool. It provides the tools needed for data analysis, visualization, statistics, and data science out of the box. In addition, it has gained acceptance in industry and academia for collaborating on projects and publishing work.
A Jupyter notebook combines text and code, with the programming runtime built into the platform, so no additional software needs to be installed. The text is written in Markdown (a lightweight markup language that renders to HTML), and the code can be in several languages. A notebook is organized into cells, each containing either text or code; together, they form a single document that can be shared or published.
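To make the cell structure described above concrete, here is a minimal sketch of how a notebook is stored on disk: an `.ipynb` file is a JSON document whose `cells` list holds markdown and code cells. The cell contents below are illustrative and are not taken from this notebook.

```python
import json

# a minimal notebook in the nbformat v4 layout (cell contents are illustrative)
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Asset map\n", "Some explanatory text."],
        },
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print('hello from a code cell')"],
        },
    ],
}

# serialize to the JSON text that would be saved as an .ipynb file
notebook_text = json.dumps(notebook, indent=1)

# read it back and list the type of each cell
cell_types = [cell["cell_type"] for cell in json.loads(notebook_text)["cells"]]
print(cell_types)
```

Because the file is plain JSON, notebooks can be versioned and shared like any other text file.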
Notebooks are organized by cells, which mainly consist of text (in Markdown) and code (Python). A notebook operates like a hybrid of an MS Word document and an Excel file: the entire file reads like a document, while the cells operate like a spreadsheet. To get started, feel free to scroll through the cells and navigate around them for a quick tour. Here is a breakdown of how to view/edit cells:
This notebook requires some Python programming. Python is widely used and has gained enough traction to be taught in high school and AP Computer Science courses.
Jupyter supports several other languages (such as R, Scala, and Julia); however, Python is the most popular of them and is also used for other tasks, primarily data science and web programming.
If you are new to Jupyter, then please review the links below:
If you are new to programming or Python, then please review the links below:
If you are new to Markdown, then please review the links below:
# 01 - load modules into notebook
# data analysis module
import pandas as pd
# numerical data modules
import numpy as np
import scipy
# data visualization module
import matplotlib.pyplot as plt
# adjust plot settings
%matplotlib inline
# data visualization module
# https://seaborn.pydata.org/
# import seaborn as sns; sns.set(color_codes=True)
# geospatial modules
from shapely.geometry import LineString, Point, Polygon, shape
import geopandas as gpd
import geojsonio
from descartes import PolygonPatch
import fiona
# geospatial and geojson modules
import folium
from folium.plugins import MarkerCluster
import os
import json
# install pip package in current kernel; run only for initial install:
# https://jakevdp.github.io/blog/2017/12/05/installing-python-packages-from-jupyter/
# import sys
# !{sys.executable} -m pip install --upgrade pip
# !{sys.executable} -m pip install seaborn==0.9.0
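The commented-out `!{sys.executable} -m pip install` lines above install into the interpreter that backs the running kernel. As a rough sketch of the same idea in plain Python, the snippet below checks whether a module is importable and builds the matching pip command. The helper names `is_installed` and `pip_command` are hypothetical, and the command is only constructed here, not executed.

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if a top-level module can be found by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


def pip_command(package):
    """Build a pip install command bound to the interpreter running this kernel."""
    return [sys.executable, "-m", "pip", "install", package]


# report which modules the current kernel can import
for module_name in ["json", "folium"]:
    status = "available" if is_installed(module_name) else "missing"
    print(module_name, status)

# the command that would install a pinned package into this kernel
print(pip_command("seaborn==0.9.0"))
```

To actually run the install, the returned list could be passed to `subprocess.run(..., check=True)`.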
# 02.00 - data functions
# function to read csv file
# https://stackoverflow.com/questions/32400867/pandas-read-csv-from-url/41880513#41880513
def read_data(path):
    df = pd.read_csv(path)
    return df
# function to output csv file
def output_result(df, filepath):
    df.to_csv(filepath)
# function to show table info
def data_profile(df, msg):
    # pass in variable into string
    # https://stackoverflow.com/questions/2960772/how-do-i-put-a-variable-inside-a-string
    print('*** Table Info: %s ***' % msg, '\n')
    # df.info() prints its report directly and returns None, so do not wrap it in print()
    df.info()
    print()
    print('*** Table Info: Table Dimensions ***', '\n')
    print(df.shape, '\n')
# function to show unique value for given column
def show_unique(df, col):
    # pass in variable into string
    # https://stackoverflow.com/questions/2960772/how-do-i-put-a-variable-inside-a-string
    print('*** Unique Values: (%s) ***' % col, '\n')
    print(df[col].unique(), '\n')
# function to output summary stats
def summary_stats(df, col):
    # pass in variable into string
    # https://stackoverflow.com/questions/2960772/how-do-i-put-a-variable-inside-a-string
    print('*** Summary Stats: (%s) ***' % col, '\n')
    print(df[col].describe(), '\n')
    # print(col.describe())
# function to rename columns
# https://www.geeksforgeeks.org/how-to-rename-columns-in-pandas-dataframe/
def rename_col(df, old_col, new_col):
    df.rename(
        columns={old_col: new_col},
        inplace=True
    )
    return df
# function convert col to numeric type
# reference: https://stackoverflow.com/questions/47333227/pandas-valueerror-cannot-convert-float-nan-to-integer
def convert_num(df, col):
    # convert type; unparseable values become NaN
    df[col] = pd.to_numeric(
        df[col],
        errors='coerce'
    )
    return df
# convert string to datetime
# reference: https://stackoverflow.com/questions/32888124/pandas-out-of-bounds-nanosecond-timestamp-after-offset-rollforward-plus-adding-a
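def convert_date(df, col):
    # convert type; unparseable values become NaT
    # note: infer_datetime_format is deprecated in pandas 2.0+, where format inference is the default
    df[col] = pd.to_datetime(
        df[col],
        infer_datetime_format=True,
        errors='coerce'
    )
    return df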
# function convert col to string type
def convert_str(df, col):
    # convert type; astype returns a new Series, so assign it back to the column
    df[col] = df[col].astype(str)
    return df
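Both `convert_num` and `convert_date` pass `errors='coerce'`, which replaces values that cannot be parsed with a missing-value marker instead of raising an error. The sketch below shows that behavior on a made-up table; the column names and values are hypothetical and do not come from the project data.

```python
import pandas as pd

# made-up sample with one unparseable value per column
df_demo = pd.DataFrame({
    "enrollment": ["120", "n/a", "85"],
    "opened": ["2019-08-15", "unknown", "2020-01-06"],
})

# unparseable numbers become NaN, unparseable dates become NaT
df_demo["enrollment"] = pd.to_numeric(df_demo["enrollment"], errors="coerce")
df_demo["opened"] = pd.to_datetime(df_demo["opened"], errors="coerce")

# count how many values were coerced to missing in each column
print(df_demo.isna().sum())
```

Checking the missing-value counts after conversion is a quick way to see how much data was silently dropped by coercion.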
# 02.01 - data import
# sf open data portal - sfpd reports (2003-2018)
# https://data.sfgov.org/Public-Safety/Police-Department-Incident-Reports-Historical-2003/tmnf-yvry
# df_data = read_data("data/sfpd_report_2003-18.csv")
# note: reduce original file (500mb) by subset first 10k rows and replace file
# https://datacarpentry.org/python-ecology-lesson/03-index-slice-subset/index.html
# df_data = df_data[0:10000]
# output_result(df_data, "data/sfpd_report_2003-18.csv")
# read in reduced file after processing steps above
# https://data.sfgov.org/Public-Safety/Police-Department-Incident-Reports-Historical-2003/tmnf-yvry
df_sfpd = read_data("data/sfpd_report_2003-18.csv")
# ca geoportal - education dataset (2019-20)
# https://gis.data.ca.gov/datasets/CDEGIS::california-schools-2019-20
df_school = read_data("data/ca_school_2019-20.csv")
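The note above trims the 500 MB source file by loading it fully and then slicing the first 10,000 rows. A possible alternative, sketched here on a small in-memory CSV rather than the real file, is the `nrows` argument of `pd.read_csv`, which stops reading after the requested number of rows. The `id` column and all values are made up; `X` and `Y` mirror the coordinate columns used later in the notebook.

```python
import io

import pandas as pd

# build a small in-memory CSV standing in for a large file (values are made up)
csv_text = "id,X,Y\n" + "\n".join(
    f"{i},-122.45,37.76" for i in range(100)
)

# read only the first 10 data rows instead of the whole file
df_head = pd.read_csv(io.StringIO(csv_text), nrows=10)
print(df_head.shape)
```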
# 02.02 - data processing
# subset dataset by row values; for example, schools by county
# https://stackoverflow.com/questions/17071871/how-to-select-rows-from-a-dataframe-based-on-column-values
df_school_sac = df_school[
df_school['CountyName'].str.contains('Sacramento')
]
df_school_amador = df_school[
df_school['CountyName'].str.contains('Amador')
]
df_school_placer = df_school[
df_school['CountyName'].str.contains('Placer')
]
df_school_yolo = df_school[
df_school['CountyName'].str.contains('Yolo')
]
df_school_yuba = df_school[
df_school['CountyName'].str.contains('Yuba')
]
# todo: process geojson
# https://opendata.arcgis.com/datasets/f7f818b0aa7a415192eaf66f192bc9cc_0.geojson
# df_school_geojson = read_data("data/ca_school_2019-20.geojson")
# data profile data after import
data_profile(df_sfpd, 'SFPD Reports (2003-18)')
data_profile(df_school, 'CA Schools (2019-20)')
data_profile(df_school_sac, 'CA Schools: Sacramento County (2019-20)')
data_profile(df_school_amador, 'CA Schools: Amador County (2019-20)')
data_profile(df_school_placer, 'CA Schools: Placer County (2019-20)')
data_profile(df_school_yolo, 'CA Schools: Yolo County (2019-20)')
data_profile(df_school_yuba, 'CA Schools: Yuba County (2019-20)')
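The five county subsets above repeat the same filter with a different name each time. As one possible way to condense that pattern, the sketch below builds the subsets in a dictionary keyed by county. The rows are made up; only the `CountyName` column name comes from the notebook. Passing `na=False` is an added assumption that rows with a missing county name should simply be excluded.

```python
import pandas as pd

# made-up rows; only the CountyName column name comes from the notebook
df_demo = pd.DataFrame({
    "School": ["A", "B", "C", "D"],
    "CountyName": ["Sacramento", "Yolo", None, "Sacramento"],
})

counties = ["Sacramento", "Amador", "Placer", "Yolo", "Yuba"]

# one subset per county; na=False treats missing names as non-matches
subsets = {
    county: df_demo[df_demo["CountyName"].str.contains(county, na=False)]
    for county in counties
}

# report the number of rows found for each county
for county, subset in subsets.items():
    print(county, len(subset))
```

Each subset could then be profiled in a loop with `data_profile` instead of one call per county.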
# 03.00 - map functions
# tutorial - folium plot with cluster markers
# https://python-visualization.github.io/folium/quickstart.html
# https://www.jpytr.com/post/analysinggeographicdatawithfolium/
# https://github.com/python-visualization/folium/blob/master/examples/MarkerCluster.ipynb
# function to plot coordinates with cluster markers
def plot_cluster(col1, col2, icon_color, cluster_name, map):
    # zip lat/long into list
    location = list(zip(col1, col2))
    # icon = [folium.Icon(color='red') for _ in range(len(location_sac))]
    icon = [folium.Icon(color=icon_color) for _ in range(len(location))]
    # plot clusters
    cluster = MarkerCluster(
        # name='CA Schools: Sac County, 2019-20 (Red)',
        name=cluster_name,
        control=True,
        locations=location,
        icons=icon
    )
    map.add_child(cluster)
    return map
# function to add map controls and title
# https://stackoverflow.com/questions/37466683/create-a-legend-on-a-folium-map
# https://stackoverflow.com/questions/61928013/adding-a-title-or-text-to-a-folium-map
def plot_map(loc_title, file_path, map):
    # add legend and layer control
    map.add_child(folium.map.LayerControl())
    # add map title
    loc = loc_title
    title_html = '''
        <h3 align="center" style="font-size:16px"><b>{}</b></h3>
        '''.format(loc)
    map.get_root().html.add_child(folium.Element(title_html))
    # display and save map
    display(map)
    map.save(file_path)
# function to plot geojson
# https://medium.com/@rohanguptha.bompally/python-data-visualization-using-folium-and-geopandas-981857948f02
def plot_geojson(json_file, layer_title, style, map):
    # note: add json to map; however, geojson function only reads json
    # https://shallowsky.com/blog/mapping/folium-with-shapefiles.html
    folium.GeoJson(
        json_file,
        name=layer_title,
        control=True,
        style_function=lambda x: style
    ).add_to(map)
    return map
# function to import geojson, then convert to json
# https://github.com/lesley2958/twilio-geospatial
def geojson2json(file_path):
    # import geojson and view data source
    # https://raw.githubusercontent.com/lesley2958/twilio-geospatial/master/data/states.geojson
    gdf = gpd.read_file(file_path)
    # print(gdf.head(5), '\n')
    # convert to json
    geojson_str = gdf.to_json()
    # print(geojson_str)
    return geojson_str
# sacog - lihm areas (2016)
# https://data.sacog.org/datasets/d37cca2c798b48b9966b62e4bb1f380d_0?selectedAttribute=COUNTYFP10
sacog_lihm_json = geojson2json('data/sacog_lihm_areas_2016.geojson')
# city of sac - existing bike facilities (2018)
# http://data.cityofsacramento.org/datasets/15f8e048d9ad4442a3e12b6182bcd4f2_1?geometry=-121.899%2C38.464%2C-121.028%2C38.652
citysac_bike_fac_json = geojson2json('data/citysac_bike_fac_2018.geojson')
# city of sac - bikeshare opportunity areas (2016)
# http://data.cityofsacramento.org/datasets/8439c4e091a2434aafee1cf888b061f0_0?geometry=-122.330%2C38.373%2C-120.589%2C38.749
citysac_bikeshare_json = geojson2json('data/citysac_bikeshare_areas_2016.geojson')
# sacog - hfta-scs data (2020)
# http://data.sacog.org/datasets/high-frequency-transit-area-mtp-scs-2020
sacog_hfta_json = geojson2json('data/sacog_htfa_2020.geojson')
# sacog - hq-transit, sb375 data (2017)
# http://data.sacog.org/datasets/high-quality-transit-2036?geometry=-123.179%2C38.303%2C-119.697%2C39.053
sacog_sb375_json = geojson2json('data/sacog_sb375_2017.geojson')
# sacog - calenviroscreen3.0, top 25% tracts
# http://data.sacog.org/datasets/calenviroscreen-3-0-top-25-tracts?geometry=-123.212%2C38.343%2C-119.729%2C39.093
sacog_calenv_json = geojson2json('data/sacog_calenv_top25.geojson')
# sacog - air pollution, pm2.5 planning areas (2018)
# http://data.sacog.org/datasets/sacramento-pm-2-5-planning-area-
sacog_pm25_json = geojson2json('data/sacog_pm25_2018.geojson')
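`geojson2json` returns a GeoJSON string, which `plot_geojson` then hands to `folium.GeoJson`. As a sketch of what such a string contains, the snippet below parses a tiny hand-written FeatureCollection (not one of the SACOG or City of Sacramento files) with the standard `json` module and tallies its geometry types, which can serve as a quick sanity check before plotting. Note that GeoJSON stores coordinates as `[longitude, latitude]`, the reverse of the `(latitude, longitude)` order used for the map origins in this notebook.

```python
import json
from collections import Counter

# hand-written FeatureCollection; names and coordinates are made up
geojson_text = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"name": "area 1"},
            "geometry": {
                "type": "Polygon",
                # GeoJSON order is [longitude, latitude]
                "coordinates": [[
                    [-121.5, 38.5], [-121.4, 38.5], [-121.4, 38.6], [-121.5, 38.5]
                ]],
            },
        },
        {
            "type": "Feature",
            "properties": {"name": "route 1"},
            "geometry": {
                "type": "LineString",
                "coordinates": [[-121.5, 38.5], [-121.4, 38.6]],
            },
        },
    ],
})

# parse the string and count features by geometry type
collection = json.loads(geojson_text)
geometry_counts = Counter(
    feature["geometry"]["type"] for feature in collection["features"]
)
print(collection["type"], dict(geometry_counts))
```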
# 03.01 - map plot: sfpd tutorial
# note: module based on tutorial below
# https://blog.dominodatalab.com/creating-interactive-crime-maps-with-folium/
# sf open data portal - sfpd reports (2003-2018)
# https://data.sfgov.org/Public-Safety/Police-Department-Incident-Reports-Historical-2003/tmnf-yvry
# set origin
latlong_sf = (37.76, -122.45)
# create map
map_sfpd = folium.Map(location=latlong_sf, zoom_start=12)
# call function to plot coordinates with cluster markers
map_sfpd = plot_cluster(
df_sfpd.Y,
df_sfpd.X,
'red',
'SFPD: Crime Reports, 2003-2018 (Red)',
map_sfpd
)
# call function to add map controls and title
plot_map(
'Crime Report Map: City of San Francisco (2003-2018)',
'maps/03.01_sfpd_reports.html',
map_sfpd
)
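`plot_cluster` zips the latitude and longitude columns into a list of pairs. A defensive step that may be worth adding, shown here on made-up rows, is to drop records with missing coordinates first, since a marker cannot be placed at a missing location. The column names `Y` (latitude) and `X` (longitude) follow the SFPD call above; the values are hypothetical.

```python
import pandas as pd

# made-up coordinates with gaps; Y is latitude and X is longitude
df_demo = pd.DataFrame({
    "Y": [37.77, None, 37.75],
    "X": [-122.42, -122.41, None],
})

# keep only rows where both coordinates are present
df_valid = df_demo.dropna(subset=["Y", "X"])

# build (latitude, longitude) pairs in the order the map expects
locations = list(zip(df_valid["Y"], df_valid["X"]))
print(locations)
```

The cleaned columns, `df_valid["Y"]` and `df_valid["X"]`, could then be passed to `plot_cluster` in place of the raw ones.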
# 03.02 - map plot: sac area with lihm areas and schools
# ca geoportal - education dataset (2019-20)
# https://gis.data.ca.gov/datasets/CDEGIS::california-schools-2019-20
# https://opendata.arcgis.com/datasets/f7f818b0aa7a415192eaf66f192bc9cc_0.geojson
# sacog - lihm shapefile (2016)
# https://data.sacog.org/datasets/d37cca2c798b48b9966b62e4bb1f380d_0
# set origin
# https://www.latlong.net/place/sacramento-ca-usa-1079.html
latlong_sac = (38.575764, -121.478851)
# create map
map_sac_lihm_school = folium.Map(location=latlong_sac, zoom_start=8)
# plot sacog lihm data
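style_sac_lihm = {
    # 'opacity' is the Leaflet path option for stroke opacity;
    # 'line_opacity' is a folium.Choropleth argument and is ignored as a style key
    'opacity': 0.5
}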
# call function to plot geojson
map_sac_lihm_school = plot_geojson(
sacog_lihm_json,
'SACOG: Low Income High Minority (LIHM) Communities, 2016 (Blue)',
style_sac_lihm,
map_sac_lihm_school
)
# call function to plot coordinates with cluster markers
map_sac_lihm_school = plot_cluster(
df_school_sac.Latitude,
df_school_sac.Longitude,
'red',
'CA Schools: Sac County, 2019-20 (Red)',
map_sac_lihm_school
)
map_sac_lihm_school = plot_cluster(
df_school_amador.Latitude,
df_school_amador.Longitude,
'green',
'CA Schools: Amador County, 2019-20 (Green)',
map_sac_lihm_school
)
map_sac_lihm_school = plot_cluster(
df_school_placer.Latitude,
df_school_placer.Longitude,
'blue',
'CA Schools: Placer County, 2019-20 (Blue)',
map_sac_lihm_school
)
map_sac_lihm_school = plot_cluster(
df_school_yolo.Latitude,
df_school_yolo.Longitude,
'orange',
'CA Schools: Yolo County, 2019-20 (Orange)',
map_sac_lihm_school
)
map_sac_lihm_school = plot_cluster(
df_school_yuba.Latitude,
df_school_yuba.Longitude,
'purple',
'CA Schools: Yuba County, 2019-20 (Purple)',
map_sac_lihm_school
)
# call function to add map controls and title
plot_map(
'Sacramento Area Asset Map: LIHM Communities and Schools',
'maps/03.02_sac_lihm_school.html',
map_sac_lihm_school
)
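The five `plot_cluster` calls above differ only in county, marker color, and layer name. One way to keep those three in sync, sketched below with a hypothetical `layer_name` helper, is to derive the label from a single county-to-color mapping so the color in the label cannot drift from the marker color. The label format is modeled on the layer names used above.

```python
# marker color per county, as used in the map above
county_colors = {
    "Sacramento": "red",
    "Amador": "green",
    "Placer": "blue",
    "Yolo": "orange",
    "Yuba": "purple",
}


def layer_name(county, color):
    """Build a layer label that always names the same color as the markers."""
    return f"CA Schools: {county} County, 2019-20 ({color.capitalize()})"


# one (county, color, label) entry per map layer
layers = [
    (county, color, layer_name(county, color))
    for county, color in county_colors.items()
]

for layer in layers:
    print(layer)
```

Each entry could then drive one `plot_cluster` call inside a loop, paired with the matching county subset.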
# 03.03 - map plot: city of sac, lihm areas and bike facilities
# set origin
# https://www.latlong.net/place/sacramento-ca-usa-1079.html
latlong_sac = (38.575764, -121.478851)
# create map
map_sac_lihm_bike = folium.Map(location=latlong_sac, zoom_start=12)
# plot sacog lihm data
style_sac_lihm = {
'fillColor': '#ff4500',
'color': '#ff4500'
}
# call function to plot geojson
map_sac_lihm_bike = plot_geojson(
sacog_lihm_json,
'SACOG: Low Income High Minority (LIHM) Communities, 2016 (Orange)',
style_sac_lihm,
map_sac_lihm_bike
)
style_bike_fac = {
'fillColor': '#008000',
'color': '#008000'
}
map_sac_lihm_bike = plot_geojson(
citysac_bike_fac_json,
'City of Sac: Bike Facilities, 2018 (Green)',
style_bike_fac,
map_sac_lihm_bike
)
style_bikeshare = {
'fillColor': '#9370db',
'color': '#9370db'
}
map_sac_lihm_bike = plot_geojson(
citysac_bikeshare_json,
'City of Sac: Bikeshare Opportunity Areas, 2016 (Purple)',
style_bikeshare,
map_sac_lihm_bike
)
# call function to add map controls and title
plot_map(
'City of Sacramento Asset Map: LIHM Communities and Bike Facilities',
'maps/03.03_sac_lihm_bike.html',
map_sac_lihm_bike
)
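Each style dictionary in this map sets `fillColor` and `color` to the same hex value. A small helper can remove that repetition; `make_style` below is a hypothetical name, and the sketch only builds the dictionaries without drawing anything.

```python
def make_style(color):
    """Return a style dict using one color for both outline and fill."""
    return {"fillColor": color, "color": color}


# the three layer styles used in this map, keyed by a short label
styles = {
    "lihm": make_style("#ff4500"),
    "bike_fac": make_style("#008000"),
    "bikeshare": make_style("#9370db"),
}

# a style function receives a GeoJSON feature and returns its style dict
style_function = lambda feature: styles["lihm"]
print(style_function({"type": "Feature"}))
```

The dictionaries have the same shape as `style_sac_lihm`, `style_bike_fac`, and `style_bikeshare`, so they could be passed to `plot_geojson` unchanged.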
# 03.04 - map plot: city of sac, lihm areas and public transit
# set origin
# https://www.latlong.net/place/sacramento-ca-usa-1079.html
latlong_sac = (38.575764, -121.478851)
# create map
map_sac_lihm_transit = folium.Map(location=latlong_sac, zoom_start=12)
# plot sacog lihm data
style_sac_lihm = {
'fillColor': '#ff4500',
'color': '#ff4500'
}
# call function to plot geojson
map_sac_lihm_transit = plot_geojson(
sacog_lihm_json,
'SACOG: Low Income High Minority (LIHM) Communities, 2016 (Orange)',
style_sac_lihm,
map_sac_lihm_transit
)
style_sac_hfta = {
'fillColor': '#9370db',
'color': '#9370db'
}
map_sac_lihm_transit = plot_geojson(
sacog_hfta_json,
'SACOG: High Frequency Transit Areas (HFTAs), 2020 (Purple)',
style_sac_hfta,
map_sac_lihm_transit
)
style_sac_sb375 = {
'fillColor': '#008000',
'color': '#008000'
}
map_sac_lihm_transit = plot_geojson(
sacog_sb375_json,
'SACOG: High Quality Transit (SB375), 2017 (Green)',
style_sac_sb375,
map_sac_lihm_transit
)
# call function to add map controls and title
plot_map(
'City of Sacramento Asset Map: LIHM Communities and Transit',
'maps/03.04_sac_lihm_transit.html',
map_sac_lihm_transit
)
# 03.05 - map plot: city of sac, lihm areas and pollution
# set origin
# https://www.latlong.net/place/sacramento-ca-usa-1079.html
latlong_sac = (38.575764, -121.478851)
# create map
map_sac_lihm_pollution = folium.Map(location=latlong_sac, zoom_start=12)
# plot sacog lihm data
style_sac_lihm = {
'fillColor': '#ff4500',
'color': '#ff4500'
}
# call function to plot geojson
map_sac_lihm_pollution = plot_geojson(
sacog_lihm_json,
'SACOG: Low Income High Minority (LIHM) Communities, 2016 (Orange)',
style_sac_lihm,
map_sac_lihm_pollution
)
style_sac_pm25 = {
'fillColor': '#9370db',
'color': '#9370db'
}
map_sac_lihm_pollution = plot_geojson(
sacog_pm25_json,
'SACOG: Air Pollution PM 2.5 Planning Areas, 2018 (Purple)',
style_sac_pm25,
map_sac_lihm_pollution
)
style_sac_calenv = {
'fillColor': '#008000',
'color': '#008000'
}
map_sac_lihm_pollution = plot_geojson(
sacog_calenv_json,
'SACOG: CalEnviroScreen3.0, Top 25% Tracts (Green)',
style_sac_calenv,
map_sac_lihm_pollution
)
# call function to add map controls and title
plot_map(
'City of Sacramento Asset Map: LIHM Communities and Pollution Levels',
'maps/03.05_sac_lihm_pollution.html',
map_sac_lihm_pollution
)